{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-colab"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/pathwaycom/pathway/blob/main/examples/notebooks/tutorials/transformer_tree_example.ipynb\" target=\"_parent\"><img src=\"https://pathway.com/assets/colab-badge.svg\" alt=\"Run In Colab\" class=\"inline\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Installing Pathway with Python 3.10+\n",
        "\n",
        "In the cell below, we install Pathway into a Python 3.10+ Linux runtime.\n",
        "\n",
        "> **If you are running in Google Colab, please run the colab notebook (Ctrl+F9)**, disregarding the 'not authored by Google' warning.\n",
        "> \n",
        "> **The installation and loading time is less than 1 minute**.\n"
      ],
      "metadata": {
        "id": "notebook-instructions"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "%%capture --no-display\n",
        "!pip install --prefer-binary pathway"
      ],
      "metadata": {
        "id": "pip-installation-pathway",
        "cellView": "form"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "id": "1",
      "metadata": {},
      "source": [
        "# Implementing a tree using transformer classes\n",
        "A more advanced example on how transformer classes work by implementing a tree.\n",
        "\n",
        "Pathway's transformer class is a powerful tool to perform advanced operations on tables.\n",
        "In the following, we are going to show you how to use a transformer class to implement a tree and compute recursive operations on it.\n",
        "\n",
        "We strongly advise you to read our [introduction on transformer classes](/developers/user-guide/diving-deeper/transformer-introduction/) and the [simple examples](/developers/user-guide/diving-deeper/transformer-example/) before reading further.\n",
        "\n",
        "## Pathway Data Structures \\& Algorithms 101: How to represent a Tree?\n",
        "\n",
        "Let's take a look at one of the simplest graph-like data structures: a tree. Let's encode tree nodes into a table with columns:\n",
        "\n",
        "1. Node ID\n",
        "2. A value `val` of integer type, stored in nodes of the tree.\n",
        "3. The node's parent ID - which can be Null for the root.\n",
        "\n",
        "To do this, in Pathway you can write the following schema for the considered table (ID's are implicit and don't need to be defined)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "2",
      "metadata": {
        "lines_to_end_of_cell_marker": 2
      },
      "outputs": [],
      "source": [
        "from __future__ import annotations\n",
        "\n",
        "from typing import Optional\n",
        "\n",
        "import pathway as pw\n",
        "\n",
        "\n",
        "class Nodes(pw.Schema):\n",
        "    val: int\n",
        "    parent: Optional[pw.Pointer[Nodes]]"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "3",
      "metadata": {
        "lines_to_next_cell": 2
      },
      "source": [
        "## Transformer Classes acting on a single row\n",
        "\n",
        "You would now like to compute some basic statistics on the tree. For example, is a given node the root? In Python, this would follow through a simple row operation:\n",
        "\n",
        "```py\n",
        "    # We would want to add this logic as a \"method\" to the `nodes` schema\n",
        "\n",
        "    def is_root(self):\n",
        "        return self.parent is None\n",
        "```\n",
        "\n",
        "How to make a transformer which takes a table following the schema `nodes` and \"gives it\" the method above? The answer is a Transformer Class which acts on a single table argument called `nodes`, and adds the `is_root` logic as an output argument. We call our transformer `tree_node_roots`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "4",
      "metadata": {},
      "outputs": [],
      "source": [
        "@pw.transformer\n",
        "class tree_node_roots:\n",
        "    class nodes(pw.ClassArg, input=Nodes):\n",
        "        val = pw.input_attribute()\n",
        "        parent = pw.input_attribute()\n",
        "\n",
        "        @pw.output_attribute\n",
        "        def is_root(self):\n",
        "            return self.parent is None"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "5",
      "metadata": {},
      "source": [
        "Let's provide a quick explanation of what happens here.\n",
        "You can specify `Nodes` as input for the class `nodes` to enforce that the rows of the table are of type `Nodes`.\n",
        "You link the parameters of the class `nodes` to the ones of `Nodes` with the `pw.input_attribute()` function. Note that the names of the parameters (`val` and `parent` in the example) must be exactly the same as the column names of the input table.\n",
        "Finally, you declare the different columns of the resulting table using the annotation `pw.output_attribute` on different functions. Each function defines a column in the output table and the value of the function is going to be used to as the value: the name of the function defines the name of the column.\n",
        "\n",
        "You can now use `tree_node_roots` as a transformer, and call `tree_node_roots(TN)` for a table `TN` of nodes to get the required output columns, just as you would for any other transformer."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "6",
      "metadata": {
        "lines_to_end_of_cell_marker": 2
      },
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "[2024-06-04T06:35:13]:INFO:Preparing Pathway computation\n"
          ]
        },
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "[2024-06-04T06:35:13]:INFO:Preparing Pathway computation\n"
          ]
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "            | val | parent_label | parent\n",
            "^X1MXHYY... | 0   |              |\n",
            "^YYY4HAB... | 1   | 0            | ^X1MXHYY...\n",
            "^Z3QWT29... | 2   | 0            | ^X1MXHYY...\n",
            "^3CZ78B4... | 3   | 1            | ^YYY4HAB...\n",
            "^3HN31E1... | 4   | 1            | ^YYY4HAB...\n",
            "^3S2X6B2... | 5   | 2            | ^Z3QWT29...\n",
            "^A984WV0... | 6   | 2            | ^Z3QWT29...\n",
            "            | is_root\n",
            "^YYY4HAB... | False\n",
            "^Z3QWT29... | False\n",
            "^3CZ78B4... | False\n",
            "^3HN31E1... | False\n",
            "^3S2X6B2... | False\n",
            "^A984WV0... | False\n",
            "^X1MXHYY... | True\n"
          ]
        }
      ],
      "source": [
        "tree = pw.debug.table_from_markdown(\n",
        "    \"\"\"\n",
        "    | val | parent_label\n",
        " 0  | 0    |\n",
        " 1  | 1    | 0\n",
        " 2  | 2    | 0\n",
        " 3  | 3    | 1\n",
        " 4  | 4    | 1\n",
        " 5  | 5    | 2\n",
        " 6  | 6    | 2\n",
        " \"\"\"\n",
        ")\n",
        "tree += tree.select(parent=tree.pointer_from(tree.parent_label, optional=True))\n",
        "pw.debug.compute_and_print(tree)\n",
        "\n",
        "result = tree_node_roots(tree).nodes\n",
        "pw.debug.compute_and_print(result)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "7",
      "metadata": {
        "lines_to_next_cell": 2
      },
      "source": [
        "## Transformer Classes acting on multiple rows\n",
        "\n",
        "Now, let's try something which shows the power of Pathway a bit more. Suppose you would like to see how many steps away a node is from its root. Let's call this the `level` of a node. How would you compute this?\n",
        "\n",
        "Logically, the `level` of a node is higher by 1 unit than the `level` of its parent. So, the solution can be obtained by recursion.\n",
        "\n",
        "Recursion is perhaps something you would think twice about before [attempting in SQL](https://medium.com/swlh/recursion-in-sql-explained-graphically-679f6a0f143b). In Pathway, recursion is natively supported, and efficient to use where the \"recursion stack\" does not change much for old data rows as new data arrives.\n",
        "\n",
        "The transformer which does just what we want is provided below."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "8",
      "metadata": {},
      "outputs": [],
      "source": [
        "@pw.transformer\n",
        "class tree_node_roots_and_levels:\n",
        "    class nodes(pw.ClassArg, input=Nodes):\n",
        "        val = pw.input_attribute()\n",
        "        parent = pw.input_attribute()\n",
        "\n",
        "        @pw.output_attribute\n",
        "        def is_root(self):\n",
        "            return self.parent is None\n",
        "\n",
        "        @pw.output_attribute\n",
        "        def level(self):\n",
        "            if self.is_root:\n",
        "                return 0\n",
        "            else:\n",
        "                return 1 + self.transformer.nodes[self.parent].level"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "9",
      "metadata": {},
      "source": [
        "Most of the logic is contained in the final line, `1 + self.transformer.nodes[self.parent].level`.\n",
        "\n",
        "You obtain the following table:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "10",
      "metadata": {},
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "[2024-06-04T06:35:13]:INFO:Preparing Pathway computation\n"
          ]
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "            | is_root | level\n",
            "^YYY4HAB... | False   | 1\n",
            "^Z3QWT29... | False   | 1\n",
            "^3CZ78B4... | False   | 2\n",
            "^3HN31E1... | False   | 2\n",
            "^3S2X6B2... | False   | 2\n",
            "^A984WV0... | False   | 2\n",
            "^X1MXHYY... | True    | 0\n"
          ]
        }
      ],
      "source": [
        "result = tree_node_roots_and_levels(tree).nodes\n",
        "pw.debug.compute_and_print(result)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "11",
      "metadata": {},
      "source": [
        "A small side note: you might simply have wanted to write here `1 + self.parent.level` instead, however, this would be missing information about the table that `self.parent` lives in. This table is identified through `self.transformer.nodes`.\n",
        "\n",
        "Though making the syntax a bit more verbose, identifying objects through both a table, and a row identifier, helps to avoid confusion.\n",
        "\n",
        "You will see why this is useful in this [article](/developers/user-guide/diving-deeper/transformer-example/) where we introduce Transformer Classes that use not just one, but two or more arguments. These will allow us to work with a `matchings` table and a `profiles` table, indicating a pair of nodes for which the required computation should be performed.\n",
        "\n",
        "\n",
        "## Conclusion\n",
        "\n",
        "In this guide, you learned how to write transformer classes building a tree and computing some basic operations on that tree. This is useful for defining row-based logic for tables, oblivious of the fact that we are operating on top of data streams.\n",
        "You can take a look at our [tour of Pathway's transformers](/developers/user-guide/diving-deeper/transformer-example/) in which you will find a list of examples of transformers.\n",
        "\n",
        "You can also check our [connectors](/developers/user-guide/connect/pathway-connectors/) to connect your data into Pathway."
      ]
    }
  ],
  "metadata": {
    "jupytext": {
      "cell_metadata_filter": "-all",
      "main_language": "python",
      "notebook_metadata_filter": "-all"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.11.9"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}